Cortex-M backend: Add quantized int8 batch matmul (CMSIS-NN) #17799
rascani wants to merge 1 commit into pytorch:main
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17799
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 Awaiting Approval, 3 New Failures as of commit cb40758 with merge base 25f2a3f.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Any thoughts @AdrianLundell? Not sure if you had an alternate approach to bmm in mind.
Summary
Add cortex_m::quantized_batch_matmul, which wraps arm_batch_matmul_s8. The RHS is always pre-transposed: a constant RHS (parameters) is transposed at AOT time in the pass, while a dynamic RHS gets a cortex_m::transpose node inserted into the graph.
It would be preferable to pre-compute or cache the constant-RHS kernel sums, but I could not find any public CMSIS-NN API that allows this.
Fixes #16109
Authored with Claude.
Test plan